[57] ABSTRACT

A method of responding to overload in a real time control
system. Overload is measured through the use of a control
parameter such as the occupancy of a control processor or
the number of entries in a queue of a module of the system.
The overload indication is reduced to one of a plurality of
levels. The levels corresponding to a longer term, more
serious overload are based on control parameter measurements
over a longer period of time than the less serious short
term overload levels. With autonomous control, each module
of the system determines its own overload level and
performs overload control actions corresponding to that
level. In integrated system overload control, a centralized
processor receives overload indications from each of the
modules of the system and requests an appropriate overload
control action of each module. Advantageously, these
arrangements allow the system and its modules to respond to
overload more rapidly and to return to normal operation
more rapidly.
INTEGRATED OVERLOAD CONTROL FOR DISTRIBUTED
REAL TIME SYSTEMS

RELATED APPLICATION

This application is related to G. Gehi, S. Lin and S.
Sohraby: "Autonomous Overload for Distributed Real Time
Systems", which application is being filed simultaneously
with this application and is being assigned to the same
assignee as this application.

TECHNICAL FIELD

This invention relates to method and apparatus for
responding to overload conditions in real time systems.

Problem

Real time systems, such as telecommunication systems,
must respond to requests such as requests for connections
made by users, who cannot be controlled by the real time
system. Accordingly, these systems are subject to overload
if an unusually large fraction of the users make simultaneous
demands on the system. In general in prior art arrangements,
the systems respond to the overload conditions by deferring
some deferrable work and by shedding load, i.e., refusing to
accept some fraction of the users' requests for connections in
one or more modules of the system. A problem of the prior
art is that it has been found difficult to restore performance
of deferrable tasks as soon as, in retrospect, it might have
been possible to re-start these tasks, and to reassume
acceptance of load as soon as, in retrospect, such load might
have been handled.

Solution

The above problems are significantly alleviated and an
advance is made over the teachings of the prior art in
accordance with applicants' invention wherein individual
modules of a distributed system perform overload control
actions as a function of the present overload indications of
several or all modules of the system. A centralized processor
receives load indications (module states) from each module
and assigns a control action to each module. In applicants'
preferred embodiment, the load indication is
the short term or long term overload level, and the control
action is the action for one of these levels, though not
necessarily the same level as the level describing the present
load or overload level of the module. Advantageously, such
an arrangement permits the system to respond in a near
optimum fashion within each module to indications of
overload within that module and overload in other modules.

In one embodiment, the system-wide approach is only
invoked if at least one of the modules is in a long term
overload level. Advantageously, short term situations are
handled locally, i.e., within a module, but the more serious
long term overloads, which may require the assistance of
special actions in other modules, are handled on a system-
wide basis.

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 is a block diagram showing a plurality of modules
in an exemplary switching system;

FIG. 2 is an overall flow diagram illustrating the process
of determining overload levels and autonomously responding
to that overload within a module;

FIGS. 3 and 4 are flow diagrams illustrating the process
of autonomously initiating overflow response actions, changing
the level of the overflow and switching between long
term and short term overflow levels; and

FIGS. 5 and 6 illustrate how modules respond to indication
of their own overflow plus those in other modules.

DETAILED DESCRIPTION

In this document, we describe an overload control
consisting of two inter-related mechanisms: short-term and
long-term. In many situations, short-term control actions have
smaller impact on the system performance while long-term
actions have more severe performance impact. Thus,
it is desirable to distinguish between short-term and
long-term. In particular, it is important that a unified control
method that does not distinguish between the two should not
be applied to the system under overload. The term overload
state will be used to include the zero overload state, i.e., the
normal state wherein there is no overload in a module.

In the autonomous control method described first,
individual switch components (nodes or modules) measure their
performance and take action based on these measurements.
All nodes are autonomous and independent and their action
may not have relevance to the states and/or actions of other
nodes. In contrast, in the "integrated" overload approach
used, in one embodiment, when one or more modules enter
a long-term overload level, the states of all nodes are
simultaneously considered in determining the course of
action during the control.

As the name implies, short-term control is in reaction
to measurements of system performance over short periods
of time. This is intended to capture the transient and
short-term overload conditions that may not be long lasting and
thus, may not require actions that result in heavy penalties
(e.g., high call blocking rates). Typical short-term control
may consist of deferring processing of non-critical tasks;
these tasks will be processed after the transient overload has
disappeared. The severity of the action in this case is not
critical and is transient.

For example, a non-critical task may be to respond to a
request to re-set registers and memory blocks that are
allocated to switch maintenance and administration and have
no immediate implications to the main tasks such as
processing calls in progress. It is important to note
that the non-critical tasks cannot be deferred for a long
period of time, because operation of the switch
or the network may be disrupted. Short-term
actions may also include deferring a critical task performed
[...] if the severity of the condition implies
that more critical action is required. For example, a
typical short-term action may be to drop at a low rate the
lower priority call signaling messages that arrive to the
switch, to relieve the switch nodes of the processing load
that would otherwise [...] blocked calls.

Under long-term control, deferring tasks (such as
non-critical processing) cannot help by itself. More severe
action such as blocking incoming signals to the system along
with deferring non-critical tasks may be necessary. The
blocking rate in this case is higher than those that may have
been applied in the short-term case.

An objective of the method described herein is to
distinguish between these two cases and to devise a specific
method for measurement and control in each case.

In the following description, the specific node
performance measure that is monitored is processor utilization.
However, this measure is considered here only as a preferred
exemplary embodiment. Other measures such as queue
length, buffer usage utilization, number of busy trunks in the
fabric controllers, etc., can be used. Such choices depend on
the specific implementations, switch architecture, and also
on the overall switch performance measures. In this
description, the term "load" will be used as the
representative measured control parameter.

Node processor utilization is measured and monitored as
the representative control parameter that is an indicator of a
node overload state. This indicator is then mapped to the
appropriate level of overload and a corresponding control
action is applied. Described hereinafter is the whole process
starting with how measurements are performed (short-term
and long-term). Processor utilization corresponds to the
percentage of time that a processor (such as the Central
Processing Unit (CPU) of a node in a switch) is busy. In order
to arrive at these measures, processor busy periods are
accumulated and divided by the total length of the
measurement interval. This ratio is called processor utilization. In
the preferred embodiment, a filtered measure (using, for
example, an exponential smoothing technique) of several
measurement intervals is used. If the measured ratio is less
than a moderate level (usually around
40%-50%) the node is said to be in normal state and usually
no overload action is required unless other modules are in an
overload state, as discussed hereinafter. When the ratio
reaches levels of 80%-90% or higher, the module is likely
to be in severe overload condition and an action should be
applied. In between the two levels, one may
decide to take actions that are not as severe as at the 90%
level. However, taking no action is also not appropriate.

In order to illustrate the difference between short-term and
long-term, suppose that in an uncontrolled switch over
intervals of say 100 ms (milliseconds) the particular node
processor that is monitored demonstrates the following
measurement results, hereafter called control parameters,
over three consecutive intervals in two separate
measurements (referred to as Cases A and B):

Case (A): 80%, 40%, 50%.
Case (B): 80%, 75%, 90%.

In Case (A), although the node is in 80% utilization over
the first 100 ms interval, it is not as highly utilized in the next
2 intervals. This is not the case in Case (B) where although
the utilization is equally high in the first interval, it remains
high in the next 2 intervals as well. We refer to Case (A) as
one possibly requiring a short-term control action while in
(B) a longer term action is needed. Initially, the actions over
the first 100 ms interval in both Cases A and B are the same;
however, as measurements of the next intervals become
available, in Case (A) we reset the short term control and
resume normal operation while in Case (B) we maintain the
previous control (or increase the severity of control action).
By the time measurements of the third interval become
available, Case (A) demonstrates a normally operating
module while Case (B) may imply the need for even more severe
action such as blocking incoming signals at a higher rate
than initially envisaged. The severity of action (such as the
rate of blocking of incoming signals) depends on the specific
module's overall performance. For example, if the overall
performance measure of the module is so bad in Case (B)
during the third interval that the utilization should be
brought down to a target utilization of say, 65%, then a
higher rate of blocking is needed than if the target utilization
is 75%.

The notation and concepts in this section are similar to
those of the flow charts. Time intervals are numbered by (n).
These intervals in the above example were assumed to be
T=100 ms in length. In addition to observing the results of
utilization measurement over a T=100 ms interval, we also
observe these measurements over a window of size W(S),
representing the number of consecutive intervals T that are
used in reaching a short-term control decision. Similarly,
W(L) represents the number of consecutive T intervals that
are used in arriving at a long-term control action. X(n)
represents the measured utilization over interval (n). The
measure that is used for short-term control during interval
(n) is represented as S(n). This measure is a filtered version
of X(n) such as from the following expression:

S(n)=a(1)X(n)+a(2)X(n-1)+ . . . +a(W(S))X(n+1-W(S))

where a(j) is the (smoothing) factor applied for the
measured utilization over the j'th interval from the most
recent measurement, and a(1) is the smoothing factor for the
most recent measurement interval. S(n), when measured at a
given time interval (n), reflects the smoothed value of the
utilization over the past W(S) consecutive intervals. S(n) is
updated at each interval, by discarding the earliest measured
value, X(n-W(S)), from the sample and shifting by one
the smoothing coefficients applied to the measurements.
Then, a new measurement sample is added to the set of W(S)
measurements.

Similarly, the long term measured value at interval (n),
L(n), is determined. The window size for L(n) is W(L),
wherein W(L)>W(S).

The process of monitoring and control works as follows:
We measure and store values of S(n) and L(n) for each
interval n. At the beginning of each interval, S(n) is
compared against two thresholds X(min,i) and X(max,i), where
i represents the present short term overload "level" of the
switch module. For example, in normal operation, when a
module is started, the overload level is the no overload level,
i=0. The thresholds are chosen such that they reflect a switch
behavior that is acceptable for the particular level (i). If the
measured value S(n) exceeds the X(max,i) at this level, then
the level is changed to be at level (i+1) over the upcoming
interval. Similarly, if the measured value S(n) is below
X(min,i), then the level is changed to be at level (i-1) in the
upcoming interval. When S(n) falls in a given level (i) (that
is, S(n) is between X(min,i) and X(max,i)) it is not necessary
to check the long term measured parameter (L(n)). In this
case, a decision has been reached as to the state of the node
overload and the appropriate controls for level (i) are applied.
(This would consist of short-term overload control actions
designed for level (i)). However, if the measured value S(n)
has exceeded the highest level after step-wise exceeding the
next higher threshold in each level, then the node has passed
the short-term overload level and is now in the long-term
overload. In this case, the measured parameter L(n) is
compared against similar long-term overload thresholds and
appropriate controls are applied.

Note that at each step, the node can only step up or down
from its existing state level by 1. When the state of a node drops
below the lowest long-term overload, then it enters the
highest short-term overload level. Similarly, when it exceeds
the highest short-term overload level it enters the lowest
long-term overload level. When a system reaches the highest
long-term level, it remains in that level; when the
state of the node drops below the lowest short term overload,
it exits the overload status altogether.

FIG. 1 illustrates the pertinent aspects of the architecture
of a switch 1 on which applicants' invention is implemented
in one preferred embodiment. The switch comprises a
plurality of message processors (MP) 10, . . . , 11 interconnected
by a bus system 15. The message processors communicate
with each other and outside switches to process messages
representing connection and disconnection requests. The
message processors communicate over bus system 15 with
a plurality of control processors 20, . . . , 21 for controlling
a plurality of network modules 30, . . . , 31. The control
processors and network modules communicate with each
other over bus system 25. The message processor, control
processor and network modules each contain a program
controlled processor system for carrying out the function
required of the processor and for executing the overload
control programs described further herein.

FIG. 2 illustrates at a high level the process of determining
an overload state in one of the modules. Action Block
201 indicates that a short term overload measurement interval
T(S) for measuring indications of short term overload is
selected. The measurement interval T(S) is selected when
the system is initialized. In applicants' preferred
embodiment, T(L), the measurement interval for long term
overload, is the same as T(S), but more intervals are used for
control and smoothing. In alternative embodiments the
interval can be dynamically selected. In Action Block 203 a
window W(S) for observing a number of consecutive
measurements for determining a short term overload is selected.
This window is also selected at initialization time.
In Action Block 205, a window W(L) for
observing a number of consecutive overload measurements
for determining the level of a long term overload is selected.
Again, W(L) is selected at initialization time. Typical values
for a high capacity switch with high capacity modules might
be T(S)=0.1 second, W(S)=3, and W(L)=10. With these
numbers long and short term overload measurements are
made every 0.1 second and are filtered over a period of 0.3
seconds for monitoring the short term overload state and
over a period of 1 second for changing the long term
overload state.

It is possible to measure different system performance 
measures, or stimulus parameters for detecting the short- and 
long-term overload states. In this case, T(S) and T(L) 
(corresponding to the short- and long-term measurement 
intervals, respectively), can be chosen independently as well 
as for different control parameters. 

The control parameter X(n) for the nth interval is mea- 
sured for each interval and the past W(L) measurements are 
stored (Action block 207). These values are then used in 
Action block 209 to determine a filtered version of the 
measurements. Based on the measured control parameter 
determined in Action block 209, the presence of overload is 
detected; if overload had previously been detected, the 
overload level is adjusted in accordance with the teachings 
of FIGS. 3 and 4. 
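
The level adjustment can be illustrated with a minimal Python sketch, assuming one (minimum, maximum) threshold pair per level. The names `Thresholds` and `OverloadLevelTracker` and all threshold values are hypothetical and are not taken from the description; the sketch only follows the stated rule that a module moves at most one level per interval, that S(n) is consulted in the short term levels and L(n) in the long term levels, and that the two ranges of levels adjoin each other.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    x_min: float  # filtered load below this value: step down one level
    x_max: float  # filtered load above this value: step up one level


class OverloadLevelTracker:
    """Tracks short term level i and long term level j for one module."""

    def __init__(self, short_term, long_term):
        # short_term[i] holds the thresholds of short term level i (i = 0: no overload).
        # long_term[j - 1] holds the thresholds of long term level j (j = 1: lowest).
        self.short_term = short_term
        self.long_term = long_term
        self.i = 0  # present short term level
        self.j = 0  # present long term level; 0 means not in long term overload

    def step(self, s_n, l_n):
        """Apply one interval's filtered values and return the resulting level."""
        if self.j == 0:
            th = self.short_term[self.i]
            if s_n > th.x_max:
                if self.i == len(self.short_term) - 1:
                    self.j = 1  # highest short term level exceeded
                else:
                    self.i += 1
            elif s_n < th.x_min and self.i > 0:
                self.i -= 1
        else:
            th = self.long_term[self.j - 1]
            if l_n > th.x_max:
                if self.j < len(self.long_term):
                    self.j += 1  # the highest long term level is retained
            elif l_n < th.x_min:
                # Dropping out of j = 1 resumes at the highest short term level,
                # because i was left at that level when long term overload began.
                self.j -= 1
        return ("long", self.j) if self.j else ("short", self.i)
```

In this sketch a single tracker instance would be kept per module and `step` would be called once per measurement interval with the current filtered values.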

In applicants' preferred embodiment, the highest level of
short term overload control leads to the lowest level of long
term overload if the overload increases, and the lowest level
of long term overload control leads to the highest level of
short term overload if the overload decreases. Action blocks
207, 209 and 211 are performed in real time in a working
system. The control parameter is measured over an interval
n, the interval number representing essentially the time, for
example, the time since the system was initialized. During
the interval, the control parameter measurement of the load
is X(n). Based on the present and several past values of the
control parameter, S(n) or L(n) for short term and long term
load, respectively, is calculated. A typical formula for
calculating S(n) is

S(n)=a(1)X(n)+a(2)X(n-1)+a(3)X(n-2).

With this filtering function, only the three most recent values
of the load measurement are used. In this typical example
a(1) equals 0.5, a(2) equals 0.4, and a(3) equals 0.1. For the
case of calculating the long term control parameter L(n), the
same techniques are used except that a series of ten
measurements are used to create the filtered long term load and
a series of ten coefficients are used for the filtering functions.
The ten coefficients b(1), b(2), . . . , b(10) in one embodiment
have the values 0.23, 0.17, 0.11, 0.1, 0.09, 0.08, 0.07, 0.06, 0.05
and 0.04.

While in the above example, the filtering is simply a linear 
addition of weighted control parameters from the present 
and previous intervals, other filtering methods such as 
exponential smoothing can be used. The process of filtering 
such data is well understood by those of ordinary skill in the 
art. 
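
As an illustration only, the weighted filtering can be sketched in Python as follows. The class name `WeightedWindowFilter` is hypothetical; the coefficient values are the ones quoted in the text for a(1)...a(3) and b(1)...b(10). Until a full window of samples has been collected, this sketch simply sums over the samples available, which is an assumption and not something the description specifies.

```python
from collections import deque

# Coefficients quoted in the text; index 0 applies to the most recent sample.
SHORT_COEFFS = (0.5, 0.4, 0.1)
LONG_COEFFS = (0.23, 0.17, 0.11, 0.1, 0.09, 0.08, 0.07, 0.06, 0.05, 0.04)


class WeightedWindowFilter:
    """Linear weighted sum over the most recent len(coeffs) measurements."""

    def __init__(self, coeffs):
        self.coeffs = tuple(coeffs)
        # Newest sample is kept on the left; once the window is full,
        # appendleft discards the oldest sample from the right.
        self.samples = deque(maxlen=len(self.coeffs))

    def update(self, x_n):
        """Add measurement X(n) and return the filtered value."""
        self.samples.appendleft(x_n)
        return sum(a * x for a, x in zip(self.coeffs, self.samples))


def filtered_sequence(coeffs, measurements):
    """Return the filtered value after each measurement in turn."""
    f = WeightedWindowFilter(coeffs)
    return [f.update(x) for x in measurements]
```

With the short term coefficients, utilizations of 80%, 40% and 50% in three successive intervals give a filtered value of about 49% after the third interval, whereas 80%, 75% and 90% give about 83%.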

After the state levels of the various processors and
fabrics are determined, there are two possibilities in apply- 
ing control. They are: 

1. Autonomous Action by Each Node in the Switch: 

In this case, each processor or fabric takes independent
action. There is one action corresponding to each overload
level. This type of control generally results in
sub-optimal performance. However, it may be less 
costly than the other approaches because there is no 
need for a central controller. Reliability of the central 
controller in this case will not be an issue. Also the cost 
of collecting and processing state information of many 
modules in order to be able to define an overall system 
view is avoided. 

2. Integrated (Overall) System State Approach: 

Since switch nodes are nowadays more reliable and 
processing cost is rapidly declining, it is feasible to 
assume that the individual nodes' states can be deter- 
mined and combined in a central processor or in each 
processor, in order to create an "overall" system state. 
Actions in this integrated case are based on the overall 
observed system state, rather than the individual node 
states. A distributed switch with integrated control has 
overload performance that is superior to the autono- 
mous overload control type. 
Because long term overload is a more serious condition, 
one arrangement is to use autonomous action as long as none 
of the modules are in long term overload, and to switch to 
integrated action as soon as any module is in a long term 
overload state. Clearly, such a switch can also be made 
according to some other criteria based on the overload state 
of the modules. 
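
The switching criterion of this arrangement can be sketched in a few lines of Python. The names `Mode` and `select_control_mode`, and the convention that a long term level of 0 means "not in long term overload", are assumptions made for the illustration.

```python
from enum import Enum


class Mode(Enum):
    AUTONOMOUS = "autonomous"
    INTEGRATED = "integrated"


def select_control_mode(long_term_levels):
    """Choose the control mode from per-module long term overload levels.

    long_term_levels maps a module name to its long term level, where 0 is
    assumed to mean that the module is not in long term overload.
    """
    if any(level > 0 for level in long_term_levels.values()):
        return Mode.INTEGRATED
    return Mode.AUTONOMOUS
```

Other criteria based on the overload states of the modules could replace the `any(...)` test without changing the rest of the sketch.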

In many circumstances, different processors, or fabrics in 
a distributed switch, may be heterogeneous; thus, it is 
advantageous to choose different measurement intervals for 
individual processors and fabrics. For example, a longer 
measurement interval may be chosen for slower processors 
and fabrics, while a shorter measurement interval is appro- 
priate for faster ones. However, all such measurements 
should be mapped to a set of states which will ultimately be 
used to decide on the overload actions. Also, the measured 
parameters for each can be different. For example, while 
buffer size may be a proper measure in deciding the overload 
state of a message processor (e.g., MP 10) in a switch, in the 
case of a bufferless fabric (e.g. network module 31), this 
measurement is not available and thus cannot be applied. In 
the latter case, a different measure such as the number of 
existing calls, or the number of outstanding call requests, or 
other fabric-specific measures may be more appropriate for 
measurement and control. 
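
One way to picture the mapping of different measures onto a common set of states is the following Python sketch. The module types, measure names and boundary values in `STATE_BOUNDS` are purely hypothetical, and the stateless lookup is a simplification: it ignores the separate minimum and maximum thresholds and the one-level-per-interval stepping described elsewhere in this description.

```python
from bisect import bisect_right

# Hypothetical boundaries per module type. A measurement at or above
# boundary k (0-based) falls in state k + 1; below the first it is state 0.
STATE_BOUNDS = {
    "message_processor": ("buffer_occupancy", [0.5, 0.7, 0.85]),
    "network_module": ("outstanding_call_requests", [200, 400, 600]),
}


def overload_state(module_type, measurement):
    """Map a module-specific measurement to a common overload state index."""
    _measure_name, bounds = STATE_BOUNDS[module_type]
    return bisect_right(bounds, measurement)
```

Both module types then report a state in the same range (0 to 3 here), even though one is measured by buffer occupancy and the other by a count of outstanding call requests.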

Similarly, the window sizes for short-term and long-term 
measurements may be chosen differently. 

Furthermore, based on the past measurement values it 
may be appropriate that the future measurement intervals be 
adjusted. Dynamic adjustment of measurement intervals 
may also occur as a result of time-of-day or actual measure- 
ments of control parameters at different fabrics and/or 
processors. 
FIGS. 3 and 4 illustrate the process of entering an
overload level, changing overload control actions within the
short term or long term overload control system, and going
between the short term and long term overload control. FIG.
3 is entered in Action block 301. One of the allowable
current levels of overload control i is the level 0 representing
no overload. This is the state which is entered when the
system is initialized. In Block 301, the short term overload
measure, S(n), is calculated. This short term control parameter
is compared with the maximum value of control parameter
associated with the present level i of short term overload
state (test 303). If the load does not exceed that threshold,
then test 305 is used to determine whether the control
parameter is now below the minimum threshold for level i.
If it is not, i.e., if the results of both tests 303 and 305 are
negative, it is an indication that the control parameter is
within the level of the current overload and the module
remains in that level (Action Block 307). Action Block 301
is then re-entered to calculate the next filtered control
parameter sample.

If the result of test 303 is positive, i.e., if the control
parameter exceeds the maximum threshold for the present
level, test 311 is used to determine whether the highest level
of short term overload has been exceeded. If not, then the
overload level is incremented to the next higher short term
overload level and Action Block 301 is re-entered to await
the next filtered load value. Similarly, if the result of test 305
is positive, i.e., if the load is less than the bottom threshold
of the present overload level, then that level is decremented
(Action Block 315), and Action Block 301 is re-entered to calculate
the next filtered short term control parameter value. (Clearly
there is no minimum threshold for the case of i=0
representing the absence of overload so that a positive result of
test 305 means that a lower level, possibly the i equals 0
level, exists.)

If the result of test 311 is positive, i.e., if the maximum
threshold for the highest level of short term overload has
been exceeded, then Action block 321 of FIG. 4 is entered.
When Action block 321 is entered from test 311, the initial
value of long term overload control j is 1 (Action block 312,
FIG. 3). (For simplicity in explaining FIGS. 3 and 4, it is
assumed that j=1 is the lowest long term overload state. In
practice, it is more likely that the lowest value of j is one
higher than the highest value of i). Action Block 321
calculates the long term filtering of load to determine if the
load is within the present level of long term overload. Test
323 is used to determine if the present load exceeds the
maximum threshold for the present level of long term
overload control. If not, then test 325 is used to determine
whether the control parameter is less than the minimum
threshold for the present level of long term overload control.
If the results of both tests 323 and 325 are negative, then
Action block 327 is entered which signifies that the overload
control remains at the present level and Action block 321 is
re-entered.

If test 323 indicates that the maximum threshold
for the present level has been exceeded then the overload
control level is incremented (Action block 329) and Action
block 321 is re-entered.

If the present filtered load is less than the minimum
threshold for the present long term overload state (positive
result of test 325), then test 331 is used to determine whether
j is already at the level 1. If so, j is decremented to 0, and
Action block 301 is re-entered with i at its highest level
(Action Block 332). If the result of test 331 is negative, i.e.,
that the long term overload level can still be decremented
without going to a short term overload level, then j is
decremented and Action Block 321 is re-entered to calculate
the next value of L(n).

The result of executing the program specified by the flow
diagrams of FIGS. 3 and 4 is that in case of excessive load,
a short-term overload level is entered which is gradually
incremented if the filtered short-term load exceeds various
thresholds; if the highest threshold is exceeded, then a long
term overload level is entered and the long-term overload
level is incremented if the filtered long term load exceeds the
various long term overload thresholds.

Similarly, a decrease in the load allows the overload level
to be decremented and allows for an escape from a long-term
overload level to a short-term overload level.

In this part of the description, control actions
(autonomous control) are confined to the module having
either short-term or long-term overload, and the control
action for a module corresponds to that module's overload
level. However, it is sometimes desirable to alter overload
controls based on the simultaneous states of several or all
modules (integrated control). For example, consider a
system which has a single MP 10 and a single CP 20. If both
the MP and the CP are in a normal state or a short-term
overload state then the normal or short-term overload
controls are invoked within each module. However, if either
module is in one of two long-term overload levels, then the
following actions are performed, which actions result in
superior overload performance than in the autonomous case.

1. Call processor in low long-term overload, message
processor in no long-term overload: message processor
blocks new incoming calls with probability P(1). Call
processor notifies external switches to throttle traffic at
throttle level 1.

2. Call processor in high long-term overload state,
message processor not in long-term overload state: message
processor blocks new incoming calls with probability
P(2), which is greater than P(1). Call processor notifies
external switches to throttle traffic further (throttle level
2).

3. Message processor in long-term overload state, call
processor not in long-term overload state: message
processor defers overhead message processing and
blocks new incoming calls with probability P(3), where
P(3)>P(2). Call processor reduces its overhead message
processing.

4. Call processor in lower long-term overload state,
message processor in long-term overload state: message
processor defers overhead message processing
and blocks new incoming calls with probability
P(4), where P(4)>P(3). Call processor notifies its
incoming switches to throttle traffic at throttle level 1.

5. Call processor in high long-term overload state,
message processor in long-term overload state: message
processor defers its overhead message processing and
blocks new incoming calls with probability P(5), which
is greater than P(4). Call processor notifies its
originating switches to throttle traffic at throttle level 2.

In this example, we have not considered actions to
balance the load more equitably among similar processors. It is
assumed that the techniques, which are well known in the
prior art, are used to balance the load among similar
processors so that in general, all similar processors are in the
same or nearly the same overload levels. To the extent that
these are not the same, the system response can be modified
so that a lower level of system overload control is invoked
if only some of the processors of one kind are in the
long-term overload level of the most heavily overloaded
processor of a particular type. For example, if there were two
message processors in the above example, only one of which
is in a long-term overload level, then the system overload
response associated with the fourth overload category
(throttle level 1 and new incoming calls blocked with
probability P(4)) is desirable.

FIG. 5 illustrates one embodiment of the integrated system
state approach. A centralized processor, assignable at
initialization time and reassigned in case of trouble, collects
the overload state of each module of the system. This
processor then derives, either through application of a
semi-Markov decision process or other heuristic, or through a
pre-stored vector, a vector specifying for each module the
appropriate response action. A centralized processor collects
the state of each module of a system (Action Block 501).
The centralized processor then calculates an action vector
specifying the action state to be executed in each module of
the system (Action Block 503). The modules are then
notified of the action state for the next interval. This process
is repeated at regular intervals.

Suppose a switching system consists of 3 modules, M1,
M2 and C1. Each module has several thresholds used in
determining their respective states. As discussed earlier,
these thresholds are levels of buffer size, queue size,
processor utilization, etc., that are considered critical for
overload measure. Suppose S=(S(1), S(2), S(3)) represents
system state at some interval of observation. Here, S(1) is
the state (e.g., occupied buffer length, occupied queue
[the remainder of this sentence and the accompanying table of
system states S and action vectors A are illegible in the source]

When the system state S=(0, 0, 0), which corresponds to
no overload in any of the modules, A=(0, 0, 0), corresponding
to no action at any of the modules. When S=(1, 0, 0),
which corresponds to M(1) in overload level 1, but M(2) and
C(1) in no overload, A=(0, 0, 0), corresponding to no action
at any of the modules. Heuristically, this is because a slight
change in the state of one module is assumed a perturbation
which does not require taking an action unless it becomes
more serious. When S=(1, 1, 0), only module M(1) takes
action "1". This action may correspond to, for example,
deferring O, A and M operation.

The measurement method used in the integrated approach is
identical to that of the autonomous/independent case. As in the
autonomous/independent case, there are short-term and
long-term intervals of measurement, and after departing the
highest short-term overload level, a particular module enters
the lowest long-term overload level, etc. What makes a
difference here in the integrated case is that rather than
deciding for an action in a particular module only based on
its own state, the decision is based on the collective set of
states of all modules. Thus, in the example of previous table,
"2" refers to the lowest long-term state for M(1). In the table
when S=(2, 0, 0), corresponding to M(1) being in the lowest
long term overload, while M(2) and C(1) are in no overload,
the corresponding action is indicated as A=(2, 1, 0). This
action corresponds, for example, to blocking calls in M(1) at
a given rate (say 5%), and at the same time deferring
operations, administration and maintenance (O, A and M)
processing in M(2) (referred to as action "1" in the
example), and "do nothing" in C(1). This Action A=(2, 1, 0)
is in contrast to the action when S=(2, 1, 1) in the table. In
this latter case, M(1) is in the lowest long term overload,
while M(2) and C(1) are found in state "1" which corresponds
to [illegible] state for modules. In this
case, A=(2, 2, 1) corresponding to 5% blocking at M(1), 5%
at M(2) and deferring O, A and M in C(1). Note that when
modules are heterogeneous, state "1" or "2" or . . . may
[illegible] overload indications 1, 2, etc.,
may mean entirely different actions. For example in the
example above, if M(2) is twice as fast as M(1), S(2)=1 can
mean a larger buffer size than S(1)=1. Similarly A(2)=2 can
mean say 2% blocking, while A(1)=2 can mean 5%
blocking, etc.

More generally, an optimum arrangement for controlling
overload is one wherein one processor continuously monitors
the load state of each module of the system, and based
on the state of all these modules, selectively applies overload
control to appropriate ones of these modules. Such an
arrangement may require a complex calculation to determine
an optimum response in each module. Compromises to the
ideal model can be made in a number of ways. Instead of
using a continuous measurement of load, the load measurement
may be provided at discrete intervals such as the
measurement interval for the autonomous overload control
case described above. Further, instead of using exact values
of load, the overload level as described above, can be used
to describe the load characteristics of each module. A further
compromise may be invoked for certain systems wherein
system control is only invoked if at least one module is in the
long term overload state. A still further compromise may be
the use of only the selected overload control levels as the
overall load control responses which the control processor
can impose on individual modules.

FIG. 5 is a general method for the use of an integrated
system approach toward overload control. An exemplary
system is the one shown in FIG. 2 which has three groups
of processors MP10, . . . MP11; CP 20, . . . CP 21; and
N 30, . . . N 31. It is assumed that MP10, . . . MP11, and
CP 20, . . . CP 21, are homogeneous groups of processors
such that any processor can handle any input message and
any call processor can handle any output message from a
message processor. For the autonomous case, each processor
simply calculates its own load and based on that load
derives an overload level which level specifies a corresponding
group of overload control actions. In contrast, for the
integrated system case, each processor still calculates its
own overload level, but transmits this level either in terms
of an overload number, or as an overload level, to a
centralized processor. The centralized processor has the task
of deciding what overload control level should, in fact, be
applied to each of the processors in the system. In one
embodiment, the centralized processor does not adjust the
overload control level of any processor unless at least one
processor has a load that corresponds to a long-term overload
level. In that case, the actions described above for FIG.
5 are executed only if one or more processors are in
long-term overload. Alternatively, the actions described for
FIG. 5 are executed even if none of the processors in the
system has a load that corresponds to a long-term overload.

FIG. 6 illustrates a less general method, but one which has
the advantage that the amount of computation required in the
centralized processor is sharply reduced. The basic philosophy
of the method of FIG. 6 is that the effect of overload of
other processors can be simplified by considering only the
"average" overload of each functional group of processors,
i.e., the MPs, the CPs, and the Ns. The basic approach uses
two methods of adjusting the
overload level of a particular processor. First, if the processor
has an overload level that is far removed from the
overload levels of the other processors in the group, the
action level is adjusted up or down toward the overload level
of the processors of the group. Second, if the "average"
overload level of the group differs substantially from the
overload level of an adjacent group, the action level of the
members of the group is adjusted toward the overload level
of the adjacent group. Effectively, both types of adjustments
serve to "flatten" the action levels both horizontally (within
a group) and vertically (from group to group).

This is illustrated in FIG. 6. The centralized processor
receives reports of the overload state of each processor
(Action Block 601). It is desirable that this report only be
sent where a change has occurred. The centralized processor
derives an "average" overload level for the group. This
"average" can be weighted toward the higher overload states
within the group since the gaps in overload are not necessarily
equal in all overload state transitions. The "average"
overload level also need not be an integer, but can include
fractions. Then two types of adjustments are made to the
overload levels of each processor to derive the action level
for that processor. If there is a substantial difference, i.e.,
more than a predetermined amount, weighted appropriately
toward the higher overload states, and adjusted experimentally
in a system, then the average action level of a processor
group is adjusted toward the higher of the overload levels of
the adjacent processor groups. Next, if the overload level of
the processor differs by more than a second appropriately
weighted predetermined amount from the average overload
state of the processor's group, the action state of that processor is
adjusted toward the average. The adjustments can be fractional
and cumulative, and are rounded to an integral
adjustment of the action level as compared to the overload
level. In general, the adjustment is unlikely to be greater than
a single step between the action level and the previous level.
The new control action level is then transmitted to each
processor whose action state has changed (Action Block
609). Note that for the integrated control case, the number of
control action levels may be greater than the number of
overload levels of each module.
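
The two adjustments of FIG. 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the function names, the thresholds `group_gap` and `member_gap`, and the fractional `step` are hypothetical, and a plain mean stands in for the "average" that the text says may be weighted toward the higher overload states.

```python
from statistics import fmean


def derive_action_levels(groups, adjacent, group_gap=1.0, member_gap=1.0, step=0.6):
    """Derive an action level per processor from reported overload levels.

    groups:   {group name: {processor name: overload level}}
    adjacent: {group name: [names of adjacent groups]}
    group_gap, member_gap, step: hypothetical tuning values (assumptions).
    """
    averages = {g: fmean(list(levels.values())) for g, levels in groups.items()}
    actions = {}
    for g, levels in groups.items():
        # Vertical adjustment: move the whole group toward the highest
        # average among its adjacent groups when the gap is substantial.
        group_shift = 0.0
        neighbours = [averages[n] for n in adjacent.get(g, [])]
        if neighbours:
            diff = max(neighbours) - averages[g]
            if abs(diff) > group_gap:
                group_shift = step if diff > 0 else -step
        for proc, level in levels.items():
            # Horizontal adjustment: move an outlier toward its group average.
            shift = group_shift
            diff = averages[g] - level
            if abs(diff) > member_gap:
                shift += step if diff > 0 else -step
            # Fractional shifts accumulate, then round to a whole level.
            actions[proc] = max(0, level + round(shift))
    return actions


def changed_only(previous, current):
    """Keep only processors whose action level differs from the last one sent."""
    return {p: a for p, a in current.items() if previous.get(p) != a}


groups = {
    "MP": {"MP10": 3, "MP11": 0},
    "CP": {"CP20": 0, "CP21": 0},
    "N": {"N30": 0, "N31": 0},
}
adjacent = {"MP": ["CP"], "CP": ["MP", "N"], "N": ["CP"]}
actions = derive_action_levels(groups, adjacent)
# actions == {"MP10": 2, "MP11": 0, "CP20": 1, "CP21": 1, "N30": 0, "N31": 0}
```

With these example values the heavily loaded MP10 is pulled down one level, the CPs are raised one level in anticipation of the load coming from the MPs, and no processor moves by more than a single step, which matches the expectation stated in the text.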

For the case of non-homogeneous processors within a
functional group, the state of the individual processors can
be weighted in deriving the average of the group. For
example, if one of the MPs is slower than the others, its
overload state should be weighted less heavily in deriving
the overload state of the group.
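
A minimal sketch of such a weighted group average follows. Choosing weights proportional to relative processor speed is an assumption made here for illustration; the text only requires that a slower processor count less heavily. The function name and the example values are hypothetical.

```python
def weighted_group_level(levels, weights):
    """Weighted average overload level of one functional group.

    levels:  {processor name: overload level}
    weights: {processor name: relative weight}, e.g. relative speed (assumption).
    """
    total = sum(weights[p] for p in levels)
    return sum(levels[p] * weights[p] for p in levels) / total


# MP11 is assumed to be half as fast as MP10, so its state counts half as much.
levels = {"MP10": 1, "MP11": 4}
weights = {"MP10": 1.0, "MP11": 0.5}
group_level = weighted_group_level(levels, weights)  # 2.0, versus 2.5 unweighted
```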

The above description is of one embodiment of applicant's
invention. Many other variations can be found by
those of ordinary skill in the art. The invention is only
limited by the attached claims.

What is claimed is: 

1. In a distributed real-time system comprising a plurality
of modules, a method of responding to overload comprising
the steps of:

in a central processor of said system, receiving load
indications from each module of said system;
responsive to receipt of said load indications, in said
central processor, deriving a control action for each of
said modules; and
said central processor informing each of said modules of
its control action;

wherein said load indications comprise long-term and
short term overload levels; and

wherein said long-term overload levels are based on
measurements taken over a longer period of time
than said short-term overload levels.

2. The method of claim 1 wherein said central processor 
begins to inform said modules of requested control actions 
if at least one of the modules is in a long-term overload level, 
and wherein each of the modules autonomously derives a 
control action if no requested control actions are received 
from said central processor. 

3. In a distributed real-time system comprising a plurality
of modules, a method of responding to overload comprising
the steps of:

in a central processor of said system, receiving load
indications from each module of said system;
responsive to receipt of said load indications, in said
central processor, deriving a control action for each of
said modules; and
said central processor informing each of said modules of
its control action;

wherein said central processor derives an average overload
level for each group of modules;

wherein said modules are grouped functionally into a
plurality of groups; and

wherein the control action requested for a particular
module is based on the overload level of the particular
module, the average overload level of a group
to which the particular module belongs, and the
average overload levels of groups adjacent to the
group to which the particular module belongs.

4. The method of claim 1, wherein said central processor 
derives an average overload level for each group of modules, 
wherein said modules are grouped functionally into a plu- 
rality of groups, and wherein the control action requested for 
a particular module is based on the overload level of the
particular module, the average overload level of a group to 
which the particular module belongs, and the average over- 
load levels of groups adjacent to the group to which the 
particular module belongs. 

5. In a distributed real time system comprising a plurality 
of modules, a method of responding to overload comprising 
the steps of: 

in a central processor of said system, receiving load
indications from each module of said system;
responsive to receipt of said load indications, deriving a 

control action for each of said modules; and 
informing each of said modules of its control action; 
wherein said load indications comprise long-term and 
short term overload levels and wherein said long- 
term overload levels are based on measurements 
taken over a longer period of time than said short- 
term overload levels. 

6. The method of claim 5, wherein said central processor 
begins to inform said modules of requested control actions 
if at least one of the modules is in a long-term overload level, 
and wherein each of the modules autonomously derives a 
control action if no requested control actions are received 
from said central processor. 

7. The method of claim 5, wherein said central processor
derives an average overload level for each group of modules,
wherein said modules are grouped functionally into a plurality
of groups, and wherein the control action requested for
a particular module is based on the overload level of the
particular module, the average overload level of a group to
which the particular module belongs, and the average overload
levels of groups adjacent to the group to which the
particular module belongs.
8. In a distributed real time system comprising a plurality
of modules, a method of responding to overload comprising
the steps of:

in a central processor of said system, receiving load
indications from each module of said system;
responsive to receipt of said load indications, deriving a
control action for each of said modules; and
informing each of said modules of its control action;
wherein said central processor derives an average overload
level for each group of modules;
wherein said modules are grouped functionally into a
plurality of groups;
wherein the control action requested for a particular
module is based on the overload level of the particular
module, the average overload level of a group
to which the particular module belongs; and
the average overload levels of groups adjacent to the
group to which the particular module belongs.

9. In a distributed real-time system comprising a plurality
of modules, a method of responding to overload comprising
the steps of:

in a central processor of said system, receiving load
indications from each module of said system;
responsive to receipt of said load indications, in said
central processor, deriving a control action for each of
said modules; and
said central processor informing each of said modules of
its control action;
wherein said plurality of modules comprises at least
two groups of functionally different modules; and
wherein said central processor derives a control action
for ones of said plurality of modules based upon load
indications from modules of at least two of said at
least two of said groups of functionally different
modules.

10. The method of claim 1, wherein said plurality of
modules comprises at least two groups of functionally
different modules, and wherein said central processor
derives a control action for ones of said plurality of modules
based upon load indications from modules of at least two of
said at least two of said groups of functionally different
modules.

11. The method of claim 2, wherein said plurality of
modules comprises at least two groups of functionally
different modules, and wherein said central processor
derives a control action for ones of said plurality of modules
based upon load indications from modules of at least two of
said at least two of said groups of functionally different
modules.

12. The method of claim 5, wherein said plurality of
modules comprises at least two groups of functionally
different modules, and wherein said central processor
derives a control action for ones of said plurality of modules
based upon load indications from modules of at least two of
said at least two of said groups of functionally different
modules.

13. The method of claim 6, wherein said plurality of
modules comprises at least two groups of functionally
different modules, and wherein said central processor
derives a control action for ones of said plurality of modules
based upon load indications from modules of at least two of
said at least two of said groups of functionally different
modules.

* * * * *